在全球范围内消除语言障碍的目标的驱动下,机器翻译已巩固自己是当今人工智能研究的关键重点。但是,这样的努力围绕着一小部分语言结合在一起,留下了绝大多数低资源的语言。在确保安全,高质量的结果的同时,在牢记道德考虑的同时,打破200个语言障碍需要什么?没有留下的语言,我们首先通过与母语人士的探索性访谈来解决对低资源语言翻译支持的必要性来应对这一挑战。然后,我们创建了旨在缩小低资源和高资源语言之间的性能差距的数据集和模型。更具体地说,我们开发了一种有条件的计算模型,基于专家的稀疏混合物,该模型经过针对针对低资源语言量身定制的新颖有效的数据挖掘技术培训的。我们提出了多次建筑和培训改进,以抵消数千个任务的培训。至关重要的是,我们使用人类翻译的基准,Flores-200评估了40,000多种不同的翻译方向的性能,并将人类评估与新型毒性基准相结合,涵盖Flores-200的所有语言,以评估翻译安全性。我们的模型相对于先前的最新技术,实现了44%BLEU的改善,为实现通用翻译系统奠定了重要的基础。最后,我们开源此工作中描述的所有贡献,可在https://github.com/facebookresearch/fairseq/tree/nllb上访问。
translated by 谷歌翻译
最先进的编码器模型(例如,用于机器翻译(MT)或语音识别(ASR))作为原子单元构造并端到端训练。没有其他模型的任何组件都无法(重新)使用。我们描述了Legonn,这是一种使用解码器模块构建编码器架构的过程,可以在各种MT和ASR任务中重复使用,而无需进行任何微调。为了实现可重复性,每个编码器和解码器模块之间的界面都基于模型设计器预先定义的离散词汇,将其接地到边缘分布序列。我们提出了两种摄入这些边缘的方法。一个是可区分的,可以使整个网络的梯度流动,另一个是梯度分离的。为了使MT任务之间的解码器模块的可移植性用于不同的源语言和其他任务(例如ASR),我们引入了一种模态不可思议的编码器,该模态编码器由长度控制机制组成,以动态调整编码器的输出长度,以匹配预期的输入长度范围的范围预训练的解码器。我们提出了几项实验来证明Legonn模型的有效性:可以重复使用德国英语(DE-EN)MT任务的训练有素的语言解码器模块,而没有对Europarl English ASR和ROMANIAN-ENGLISH进行微调(RO)(RO)(RO)(RO) -en)MT任务以匹配或击败相应的基线模型。当针对数千个更新的目标任务进行微调时,我们的Legonn模型将RO-EN MT任务提高了1.5个BLEU点,并为Europarl ASR任务降低了12.5%的相对减少。此外,为了显示其可扩展性,我们从三个模块中构成了一个legonn ASR模型 - 每个模块都在三个不同数据集的不同端到端训练的模型中学习 - 将降低的减少降低到19.5%。
translated by 谷歌翻译
Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dualencoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system greatly by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA benchmarks. 1 * Equal contribution 1 The code and trained models have been released at https://github.com/facebookresearch/DPR.
translated by 谷歌翻译
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART -a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective . mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show it also enables new types of transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training.
translated by 谷歌翻译
FAIRSEQ is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. We also support fast mixed-precision training and inference on modern GPUs. A demo video can be found here: https://www.youtube. com/watch?v=OtgDdWtHvto.
translated by 谷歌翻译
Nowadays, feature selection is frequently used in machine learning when there is a risk of performance degradation due to overfitting or when computational resources are limited. During the feature selection process, the subset of features that are most relevant and least redundant is chosen. In recent years, it has become clear that, in addition to relevance and redundancy, features' complementarity must be considered. Informally, if the features are weak predictors of the target variable separately and strong predictors when combined, then they are complementary. It is demonstrated in this paper that the synergistic effect of complementary features mutually amplifying each other in the construction of two-tier decision trees can be interfered with by another feature, resulting in a decrease in performance. It is demonstrated using cross-validation on both synthetic and real datasets, regression and classification, that removing or eliminating the interfering feature can improve performance by up to 24 times. It has also been discovered that the lesser the domain is learned, the greater the increase in performance. More formally, it is demonstrated that there is a statistically significant negative rank correlation between performance on the dataset prior to the elimination of the interfering feature and performance growth after the elimination of the interfering feature. It is concluded that this broadens the scope of feature selection methods for cases where data and computational resources are sufficient.
translated by 谷歌翻译
Determining and predicting reservoir formation properties for newly drilled wells represents a significant challenge. One of the variations of these properties evaluation is well-interval similarity. Many methodologies for similarity learning exist: from rule-based approaches to deep neural networks. Recently, articles adopted, e.g. recurrent neural networks to build a similarity model as we deal with sequential data. Such an approach suffers from short-term memory, as it pays more attention to the end of a sequence. Neural network with Transformer architecture instead cast their attention over all sequences to make a decision. To make them more efficient in terms of computational time, we introduce a limited attention mechanism similar to Informer and Performer architectures. We conduct experiments on open datasets with more than 20 wells making our experiments reliable and suitable for industrial usage. The best results were obtained with our adaptation of the Informer variant of Transformer with ROC AUC 0.982. It outperforms classical approaches with ROC AUC 0.824, Recurrent neural networks with ROC AUC 0.934 and straightforward usage of Transformers with ROC AUC 0.961.
translated by 谷歌翻译
Reinforcement learning (RL) problems can be challenging without well-shaped rewards. Prior work on provably efficient RL methods generally proposes to address this issue with dedicated exploration strategies. However, another way to tackle this challenge is to reformulate it as a multi-task RL problem, where the task space contains not only the challenging task of interest but also easier tasks that implicitly function as a curriculum. Such a reformulation opens up the possibility of running existing multi-task RL methods as a more efficient alternative to solving a single challenging task from scratch. In this work, we provide a theoretical framework that reformulates a single-task RL problem as a multi-task RL problem defined by a curriculum. Under mild regularity conditions on the curriculum, we show that sequentially solving each task in the multi-task RL problem is more computationally efficient than solving the original single-task problem, without any explicit exploration bonuses or other exploration strategies. We also show that our theoretical insights can be translated into an effective practical learning algorithm that can accelerate curriculum learning on simulated robotic tasks.
translated by 谷歌翻译
Existing 3D-aware image synthesis approaches mainly focus on generating a single canonical object and show limited capacity in composing a complex scene containing a variety of objects. This work presents DisCoScene: a 3Daware generative model for high-quality and controllable scene synthesis. The key ingredient of our method is a very abstract object-level representation (i.e., 3D bounding boxes without semantic annotation) as the scene layout prior, which is simple to obtain, general to describe various scene contents, and yet informative to disentangle objects and background. Moreover, it serves as an intuitive user control for scene editing. Based on such a prior, the proposed model spatially disentangles the whole scene into object-centric generative radiance fields by learning on only 2D images with the global-local discrimination. Our model obtains the generation fidelity and editing flexibility of individual objects while being able to efficiently compose objects and the background into a complete scene. We demonstrate state-of-the-art performance on many scene datasets, including the challenging Waymo outdoor dataset. Project page: https://snap-research.github.io/discoscene/
translated by 谷歌翻译
Imitation learning (IL) is a simple and powerful way to use high-quality human driving data, which can be collected at scale, to identify driving preferences and produce human-like behavior. However, policies based on imitation learning alone often fail to sufficiently account for safety and reliability concerns. In this paper, we show how imitation learning combined with reinforcement learning using simple rewards can substantially improve the safety and reliability of driving policies over those learned from imitation alone. In particular, we use a combination of imitation and reinforcement learning to train a policy on over 100k miles of urban driving data, and measure its effectiveness in test scenarios grouped by different levels of collision risk. To our knowledge, this is the first application of a combined imitation and reinforcement learning approach in autonomous driving that utilizes large amounts of real-world human driving data.
translated by 谷歌翻译